WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification 7: H04N 7/30

A2



(11) International Publication Number: WO 00/05898 

(43) International Publication Date: 3 February 2000 (03.02.00) 



(21) International Application Number: PCT/US99/16638

(22) International Filing Date: 21 July 1999 (21.07.99)



(30) Priority Data:
60/093,860    23 July 1998 (23.07.98)    US
09/169,829    11 October 1998 (11.10.98)    US



(63) Related by Continuation (CON) or Continuation-in-Part 
(CIP) to Earlier Applications 

US 60/093,860 (CIP) 

Filed on 23 July 1998 (23.07.98) 

US 09/169,829 (CIP) 

Filed on 11 October 1998 (11.10.98)



(81) Designated States: AE, AL, AM, AT, AU, AZ, BA, BB, BG,
BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB,
GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG,
KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK,
MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI,
SK, SL, TJ, TM, TR, TT, UA, UG, US, UZ, VN, YU, ZA,
ZW, ARIPO patent (GH, GM, KE, LS, MW, SD, SL, SZ,
UG, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD,
RU, TJ, TM), European patent (AT, BE, CH, CY, DE, DK,
ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI
patent (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR,
NE, SN, TD, TG).



Published 

Without international search report and to be republished 
upon receipt of that report. 



(71) Applicant (for all designated States except US): OPTIVISION,
INC. [US/US]; 3450 Hillview Avenue, Palo Alto, CA 94304
(US).

(72) Inventor; and 

(75) Inventor/Applicant (for US only): LI, Weiping [US/US]; 159
California Avenue, J103, Palo Alto, CA 94306 (US).

(74) Agent: DAVIS, Paul; Wilson Sonsini Goodrich & Rosati, 650
Page Mill Road, Palo Alto, CA 94304-1050 (US).



(54) Title: SCALABLE VIDEO CODING AND DECODING



[Front-page drawing (Fig. 1): original video input 20 feeds the base layer encoder 30 and the enhancement layer encoder 40; their base layer and enhancement bitstreams pass through the MUX or server 50 and channel 60 to the DEMUX 70, then to the base layer decoder 90 and enhancement layer decoder 80, producing the reconstructed video output 100.]



(57) Abstract 



A video encoding method and apparatus for adapting a video input to a bandwidth of a transmission channel of a network includes determining the number N of enhancement layer bitstreams capable of being adapted to the bandwidth of the transmission channel of the network. A base layer bitstream is encoded from the video input, and a plurality of enhancement layer bitstreams are encoded from the video input based on the base layer bitstream, wherein the plurality of enhancement layer bitstreams complements the base layer bitstream. The base layer bitstream and the N enhancement layer bitstreams are transmitted to the network.
SCALABLE VIDEO CODING AND DECODING

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a method and apparatus for scaling data signals to the bandwidth of the transmission channel, and more particularly to a scalable video method and apparatus for coding video such that the received video is adapted to the bandwidth of the transmission channel.

Description of Related Art

Signal compression has long been employed in the video arena to increase the effective bandwidth of the generating, transmitting, or receiving device. MPEG, an acronym for Moving Picture Experts Group, refers to the family of digital video compression standards and file formats developed by the group. For instance, an MPEG-1 video sequence is an ordered stream of bits, with special bit patterns marking the beginning and ending of a logical section.

MPEG achieves a high compression rate by storing only the changes from one frame to another, instead of each entire frame. The video information is then encoded using the DCT (Discrete Cosine Transform), a technique for representing waveform data as a weighted sum of cosines. MPEG uses a type of lossy compression in which some data is removed, but the loss of data is generally imperceptible to the human eye. It should be noted that the DCT itself does not lose data; rather, data compression technologies that rely on the DCT approximate some of the coefficients to reduce the amount of data.

The basic idea behind MPEG video compression is to remove spatial redundancy within a video frame and temporal redundancy between video frames. DCT-based compression is used to reduce spatial redundancy, and motion compensation is used to exploit temporal redundancy. The images in a video stream usually do not change much within small time intervals. Thus, the idea of motion compensation is to encode a video frame based on other video frames temporally close to it.

A video stream is a sequence of video frames, each frame being a still image. A video player displays one frame after another, usually at a rate close to 30 frames per second. Each frame is divided into macroblocks, and each macroblock consists of four 8x8 luminance blocks and two 8x8 chrominance blocks. Macroblocks are the units for motion-compensated compression, while blocks are the basic units for DCT compression. Frames can be encoded in three types: intra-frames (I-frames), forward predicted frames (P-frames), and bi-directional predicted frames (B-frames).

An I-frame is encoded as a single image, with no reference to any past or future frames. Each 8x8 block is encoded independently, except that the coefficient in the upper left corner of the block, called the DC coefficient, is encoded relative to the DC coefficient of the previous block. The block is first transformed from the spatial domain into a frequency domain using the DCT, which separates the signal into independent frequency bands. Most of the signal energy (the low-frequency information) lies in the upper left corner of the resulting 8x8 block. After the DCT coefficients are produced, the data is quantized, i.e., each coefficient is divided by a quantizer step size and rounded to an integer. Quantization can be thought of as ignoring lower-order bits and is the only lossy part of the whole compression process other than sub-sampling.
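To make the transform-and-quantize step concrete, here is a minimal Python sketch, assuming an orthonormal 8x8 DCT-II and a single uniform step size. Real MPEG encoders also apply quantizer weighting matrices and special DC handling, which are omitted; the names `dct_2d` and `quantize_block` are illustrative only.

```python
import math
from typing import List

Block = List[List[float]]

def dct_2d(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of a square block (direct form, fine for 8x8)."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize_block(coeffs: Block, step: float) -> List[List[int]]:
    """Uniform quantization: divide by the step size and round (the lossy step)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]
```

For a flat block of 100s, all energy lands in the DC coefficient (800 with this normalization), so quantizing with a step of 16 leaves a single non-zero value of 50 in the upper left corner.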

The resulting data is then run-length encoded in a zig-zag ordering to optimize compression. The zig-zag ordering produces longer runs of 0's by taking advantage of the fact that there should be little high-frequency information (more 0's as one zig-zags from the upper left corner towards the lower right corner of the 8x8 block).
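The zig-zag scan and the run-length step can be sketched as follows. This is an illustration under simple assumptions: it generates the conventional zig-zag order for an n x n block and turns the scanned coefficients into (run, level) pairs. The variable-length codes assigned to those pairs, the MPEG-2 alternate scan, and the end-of-block code are not modeled.

```python
from typing import List, Sequence, Tuple

def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in conventional zig-zag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(cells)
    return order

def run_level_pairs(block: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Scan a quantized block in zig-zag order; emit (zeros before, non-zero level)."""
    pairs: List[Tuple[int, int]] = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are left to an end-of-block code
```

A block whose only non-zero values are 5, 3, and -1 down the first column yields `[(0, 5), (1, 3), (0, -1)]`, because the scan visits (0, 1), which is zero, between the first two of them.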

A P-frame is encoded relative to the past reference frame. A reference frame is a P- or I-frame. The past reference frame is the closest preceding reference frame. A P-macroblock is encoded as a 16 x 16 area of the past reference frame, plus an error term.

To specify the 16 x 16 area of the reference frame, a motion vector is included. A motion vector of (0, 0) means that the 16 x 16 area is in the same position as the macroblock being encoded; other motion vectors are relative to that position. Motion vectors may include half-pixel values, in which case pixels are averaged. The error term is encoded using the DCT, quantization, and run-length encoding. A macroblock may also be skipped, which is equivalent to a (0, 0) vector and an all-zero error term.
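The following sketch shows how a motion vector with half-pixel precision selects and averages reference pixels, and how the error term is what remains. It assumes motion vectors are given in half-pixel units and uses the rounding-up averages (a + b + 1) // 2 and (a + b + c + d + 2) // 4 commonly used for MPEG-1/2 half-pel prediction. Frame-boundary handling is left out, so the caller must keep the referenced area inside `ref`. The helper names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]

def predict_macroblock(ref: Frame, top: int, left: int,
                       mv_x2: int, mv_y2: int, size: int = 16) -> List[List[int]]:
    """Prediction for the size x size block at (top, left); mv is in half-pel units."""
    py, px = 2 * top + mv_y2, 2 * left + mv_x2   # block origin in half-pel units
    y0, x0 = py // 2, px // 2                     # integer-pel part
    hy, hx = py % 2, px % 2                       # 1 if that component is a half pel
    pred = []
    for r in range(size):
        row = []
        for c in range(size):
            y, x = y0 + r, x0 + c
            # With hx = hy = 0 all four samples are the same pixel (a plain copy);
            # with one half-pel component this equals (a + b + 1) // 2.
            total = ref[y][x] + ref[y][x + hx] + ref[y + hy][x] + ref[y + hy][x + hx]
            row.append((total + 2) // 4)
        pred.append(row)
    return pred

def error_term(cur: Frame, pred: List[List[int]], top: int, left: int) -> List[List[int]]:
    """Residual that would be DCT coded: current block minus its prediction."""
    return [[cur[top + r][left + c] - p for c, p in enumerate(row)]
            for r, row in enumerate(pred)]
```

A skipped macroblock corresponds to calling `predict_macroblock` with a (0, 0) vector and treating the error term as all zeros.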

A B-frame is encoded relative to the past reference frame, the future reference frame, or both frames.

A pictorial view of the above processes and techniques in application is depicted in prior art Fig. 15, which illustrates the decoding process for SNR scalability. Scalable video coding means coding video in such a way that the quality of a received video is adapted to the bandwidth of the transmission channel. Such a coding technique is very desirable for transmitting video over a network with a time-varying bandwidth.

SNR scalability defines a mechanism to refine the DCT coefficients encoded in another (lower) layer of a scalable hierarchy. As illustrated in prior art Fig. 15, data from two bitstreams is combined after the inverse quantization processes by adding the DCT coefficients. Until the data is combined, the decoding processes of the two layers are independent of each other.
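The combination step of Fig. 15 amounts to adding two inverse-quantized coefficient arrays. The sketch below assumes a plain uniform inverse quantizer (level times step) for both layers. MPEG-2's actual inverse quantization, with weighting matrices, saturation, and mismatch control, is deliberately omitted, and the function names are hypothetical.

```python
from typing import List, Sequence

def inverse_quantize(levels: Sequence[int], step: int) -> List[int]:
    """Simplified uniform inverse quantizer: reconstructed value = level * step."""
    return [level * step for level in levels]

def combine_snr_layers(base_levels: Sequence[int], base_step: int,
                       enh_levels: Sequence[int], enh_step: int) -> List[int]:
    """Decode each layer separately, then add the DCT coefficients."""
    if len(base_levels) != len(enh_levels):
        raise ValueError("layers must cover the same coefficients")
    base = inverse_quantize(base_levels, base_step)
    enh = inverse_quantize(enh_levels, enh_step)
    return [b + e for b, e in zip(base, enh)]
```

A decoder that has only the base layer stops at `[32, 0, -16]` for base levels `[2, 0, -1]` with step 16; adding enhancement levels `[1, -2, 0]` with step 4 refines this to `[36, -8, -16]`.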

The lower layer (base layer) is derived from the first bitstream and can itself be either non-scalable or require the spatial or temporal scalability decoding process, and hence the decoding of additional bitstreams, to be applied. The enhancement layer, derived from the second bitstream, contains mainly coded DCT coefficients and a small overhead.

The current MPEG-2 video coding standard includes an SNR scalability extension that allows two levels of scalability. There are at least two disadvantages of employing the MPEG-2 standard for encoding video data. One disadvantage is that the scalability granularity is not fine enough, because the MPEG-2 process is an all-or-none method: the receiving device receives either all of the data from both the base layer and the enhancement layer, or only the data from the base layer bitstream. Therefore, only two quality levels are available. In a network environment, more than two levels of scalability are usually needed.

Another disadvantage is that the enhancement layer coding in MPEG-2 is not efficient. Too many bits are needed in the enhancement layer to achieve a noticeable increase in video quality.

The present invention overcomes these disadvantages and others by providing, among other advantages, an efficient scalable video coding method with finer granularity.

SUMMARY OF THE INVENTION

The present invention can be characterized as a scalable video coding means and a system for encoding video data such that the quality of the final image is gradually improved as more bits are received. The improved quality and scalability are achieved by a method wherein an enhancement layer is subdivided into several bitstream layers or levels. Each bitstream layer is capable of carrying information complementary to the base layer information, in that as each of the enhancement layer bitstreams is added to the corresponding base layer bitstream, the quality of the resulting images is improved.

The number N of enhancement layers is determined or limited by the network that provides the transmission channel to the destination point. While the base layer bitstream is always transmitted to the destination point, the same is not necessarily true for the enhancement layers. Each layer is given a priority code, and transmission is carried out according to the priority code. In the event that not all of the enhancement layers can be transmitted, the lower priority layers are omitted. The omission of one or more enhancement layers may be due to a multitude of reasons.

For instance, the server which provides the transmission channel to the destination point may be experiencing large demand on its resources from other users. In order to try to accommodate all of its users, the server will prioritize the data and only transmit the higher priority coded packets of information. The transmission channel itself may be the limiting factor because of its bandwidth (e.g., Internet access port, Ethernet protocol, LAN, WAN, twisted-pair cable, coaxial cable, etc.), or the destination device (e.g., a modem, or the absence of an enhanced video card) may not be able to receive the additional bandwidth made available to it. In these instances only M (M is an integer = 0, 1, 2, ...) enhancement layers may be received, wherein N (N is an integer = 0, 1, 2, ...) enhancement layers were generated at the encoding stage, M ≤ N.
To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, the scalable video method and apparatus according to one aspect of the invention include a video encoding method for adapting a video input to a bandwidth of a transmission channel of a network. The method includes determining the number N of enhancement layer bitstreams capable of being adapted to the bandwidth of the transmission channel of the network. A base layer bitstream is then encoded from the video input, and N enhancement layer bitstreams are encoded from the video input based on the base layer bitstream, wherein the plurality of enhancement layer bitstreams complements the base layer bitstream. The base layer bitstream and the N enhancement layer bitstreams are then provided to the network.

According to another aspect of the present invention, a video decoding method for adapting a video input to a bandwidth of a transmission channel of a network includes determining the number M of enhancement layer bitstreams of said video input capable of being received from said transmission channel of said network. A base layer bitstream is decoded from the received video input, and M enhancement layer bitstreams are decoded from the received video input based on the base layer bitstream, wherein the M received enhancement layer bitstreams complement the base layer bitstream. The base layer bitstream and the M enhancement layer bitstreams are then reconstructed.

According to still another aspect of the present invention, a video decoding method for adapting a video input to a bandwidth of a receiving apparatus includes demultiplexing a base layer bitstream and at least one of a plurality of enhancement layer bitstreams received from a network, decoding the base layer bitstream, and decoding at least one of the plurality of enhancement layer bitstreams based on the decoded base layer bitstream, wherein the at least one of the plurality of enhancement layer bitstreams enhances the base layer bitstream. A video output is then reconstructed.

According to a further aspect of the present invention, a video encoding method for encoding enhancement layers based on a base layer bitstream encoded from a video input includes taking a difference between an original DCT coefficient and a reference point and dividing the difference between the original DCT coefficient and the reference point into N bit-planes.

According to a still further aspect of the present invention, a method of coding motion vectors of a plurality of macroblocks includes determining an average motion vector from N motion vectors for N macroblocks, utilizing the determined average motion vector as the motion vector for the N macroblocks, and encoding 1/N of the motion vectors in a base layer bitstream.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The aspects and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

It is to be understood that both the foregoing general description and the 
following detailed description are exemplary and explanatory and are intended to 
provide further explanation of the invention as claimed. 

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further 
understanding of the invention and are incorporated in and constitute a part of this 
specification, illustrate embodiments of the invention and together with the description 
serve to explain the principles of the invention. In the drawings: 

Fig. 1 illustrates a flow diagram of the scalable video encoding method of the 
present invention; 

Fig. 2A illustrates the conventional probability distribution of DCT coefficient values;

Fig. 2B illustrates the conventional probability distribution of DCT coefficient residues;

Fig. 3A illustrates the probability distribution of DCT coefficient values of the present invention;

Fig. 3B illustrates the probability distribution of DCT coefficient residues of the present invention;
Figs. 3C and 3D illustrate a method for taking a difference of a DCT coefficient of the present invention;

Fig. 5 illustrates a flow diagram for finding the maximum number of bit-planes in the DCT differences of a frame of the present invention;

Fig. 6 illustrates a flow diagram for generating (RUN, EOP) symbols of the present invention;

Fig. 7 illustrates a flow diagram for encoding enhancement layers of the present invention;

Fig. 8 illustrates a flow diagram for encoding (RUN, EOP) symbols and sign_enh values of one DCT block of one bit-plane;

Fig. 9 illustrates a flow diagram for encoding a sign_enh value of the present invention;

Fig. 10 illustrates a flow diagram for adding an enhancement difference to a DCT coefficient of the present invention;

Fig. 11 illustrates a flow diagram for converting an enhancement difference to a DCT coefficient of the present invention;

Fig. 12 illustrates a flow diagram for decoding enhancement layers of the present invention;

Fig. 13 illustrates a flow diagram for decoding (RUN, EOP) symbols and sign_enh values of one DCT block of one bit-plane;

Fig. 14 illustrates a flow diagram for decoding a sign_enh value; and

Fig. 15 illustrates a prior art conventional SNR scalability flow diagram.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

Fig. 1 illustrates the scalable video diagram 10 of an embodiment of the present invention. The original video input 20 is encoded by the base layer encoder 30 in accordance with the method represented by flow diagram 400 of Fig. 4. A DCT coefficient OC and its corresponding base layer quantized DCT coefficient QC are generated, and a difference is determined pursuant to steps 420 and 430 of Fig. 4. The difference information from the base layer encoder 30 is passed to the enhancement layer encoder 40, which encodes the enhancement information.

The encoding of the enhancement layer encoder is performed pursuant to methods 500 - 900 as depicted in Figs. 5-10, respectively, and will be briefly described. The bitstream from the base layer encoder 30 and the N bitstreams from the enhancement layer encoder 40 are capable of being sent to the transmission channel 60 by at least two methods.

In the first method, all bitstreams are multiplexed together by multiplexor 50 with different priority identifiers; e.g., the base layer bitstream is guaranteed, and enhancement bitstream layer 1 provided by enhancement layer encoder 40 is given a higher priority than enhancement bitstream layer 2. The prioritization is continued until all N (wherein N is an integer 0, 1, 2, ...) bitstream layers are prioritized. Logic in the encoders 30 or 40, in negotiation with the network and intermediate devices, determines the number N of bitstream layers to be generated.

The number of bitstream layers generated is a function of the total possible bandwidth of the transmission channel 60, e.g., Ethernet, LAN, or WAN connections (this list is not intended to be exhaustive but only representative of potential limiting devices and/or equipment), and of the network and other intermediate devices. The number M of bitstream layers (wherein M is an integer and M ≤ N) reaching the destination point 100 can be further limited not just by the physical constraints of the intermediate devices but also by congestion on the network, which necessitates dropping bitstream layers according to their priority.

In a second method, the server 50 knows the condition of the transmission channel 60, i.e., congestion and other physical constraints, and selectively sends the bitstreams to the channel according to the priority identifiers. In either case, the destination point 100 receives the bitstream for the base layer and M bitstreams for the enhancement layer, where M ≤ N.
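The server-side selection in the second method can be pictured as a greedy walk down the priority order. This is a hypothetical sketch, assuming the channel condition is summarized as a bit budget per transmission interval and that the size of each layer is known; the function `select_layers` and its parameters are not from the original. Because each enhancement layer refines the layers above it, the loop stops at the first layer that does not fit rather than skipping ahead to a smaller one.

```python
from typing import Sequence

def select_layers(base_bits: int, enh_bits: Sequence[int], budget_bits: int) -> int:
    """Return M, how many enhancement layers (highest priority first) fit in the budget."""
    if base_bits > budget_bits:
        raise ValueError("the base layer alone exceeds the channel budget")
    remaining = budget_bits - base_bits   # the base layer is always sent
    m = 0
    for size in enh_bits:                 # enh_bits[0] is enhancement layer 1
        if size > remaining:
            break                         # drop this and every lower-priority layer
        remaining -= size
        m += 1
    return m
```

With a 100-bit base layer, three 50-bit enhancement layers, and a 220-bit budget, the sketch sends M = 2 of the N = 3 layers.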

The received bitstreams are demultiplexed by demultiplexor 70 and sent to the base layer decoder 90 and the enhancement layer decoder 80. The decoded enhancement information from the enhancement layer decoder is passed to the base layer decoder to composite the reconstructed video output 100. The decoding of the multiplexed bitstreams is accomplished pursuant to the methods and algorithms depicted in flow diagrams 1100 - 1400 of Figs. 11-14, respectively.

The base layer encoder and decoder are capable of performing logic pursuant to the MPEG-1, MPEG-2, or MPEG-4 (Version 1) standards that are hereby incorporated by reference into this disclosure.

Taking Residue with Probability Distribution Preserved 

A detailed description of taking the residue with the probability distribution preserved will now be made with reference to Figs. 2A - 3B.

In the current MPEG-2 signal-to-noise ratio (SNR) scalability extension, a residue or difference is taken between the original DCT coefficient and the quantized DCT coefficient. Fig. 2A illustrates the probability distribution of DCT coefficient values: small values have higher probabilities and large values have smaller probabilities. The intervals along the horizontal axis represent quantization bins, and the dot in the center of each interval represents the quantized DCT coefficient. Taking the residue between the original and the quantized DCT coefficient is equivalent to moving the origin to the quantization point.

Therefore, the probability distribution of the residue becomes that shown in Fig. 2B. A residue taken from the positive side of Fig. 2A has a higher probability of being negative than positive, and a residue taken from the negative side of Fig. 2A has a higher probability of being positive than negative. The result is that the probability distribution of the residue becomes almost uniform, which makes coding the residue more difficult.

A vastly superior method is to generate a difference between the original coefficient and the lower boundary point (in magnitude) of the quantization interval, as shown in Fig. 3A and Fig. 3B. In this method, the residue taken from the positive side of Fig. 2A remains positive and the residue from the negative side of Fig. 2A remains negative. Taking the residue is equivalent to moving the origin to the reference point, as illustrated in Fig. 3A. Thus, the probability distribution of the residue becomes as shown in Fig. 3B. This method preserves the shape of the original non-uniform distribution. Although the dynamic range of the residue taken in this manner seems to be twice that depicted in Fig. 2B, there is no longer a need to code the sign, i.e., - or +, of the residue. The sign of the residue is encoded in the base layer bitstream corresponding to the enhancement layer; therefore this redundancy is eliminated and the bits representing the sign are saved. Thus, only the magnitude, which still has a non-uniform distribution, needs to be coded.
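A small numeric sketch contrasts the two residues. It assumes a uniform base-layer quantizer whose bins are centred on multiples of `step`, and takes as the reference point of a non-zero bin the bin boundary nearer zero. For the zero bin the reference point is 0, which is the case where a sign must still be sent (see the sign-coding embodiment). The exact reference point used in Figs. 3C and 3D may differ, and all names are hypothetical.

```python
def quantize(oc: float, step: float) -> int:
    """Hypothetical base-layer quantizer: nearest multiple of step, ties away from zero."""
    sign = 1 if oc >= 0 else -1
    return sign * int(abs(oc) / step + 0.5)

def conventional_residue(oc: float, step: float) -> float:
    """MPEG-2 SNR style residue: original minus the quantization point."""
    return oc - quantize(oc, step) * step

def reference_point(qc: int, step: float) -> float:
    """Boundary of the quantization bin nearer zero (0 for the zero bin)."""
    if qc == 0:
        return 0.0
    sign = 1 if qc > 0 else -1
    return sign * (abs(qc) - 0.5) * step

def preserved_residue(oc: float, step: float) -> float:
    """Residue that keeps the sign of the original coefficient."""
    return oc - reference_point(quantize(oc, step), step)
```

With `step = 10`, the coefficient 17 falls in the bin centred on 20: the conventional residue is -3 (the sign flips), whereas the preserved residue is 2.0. Symmetrically, -17 gives +3 versus -2.0, so only the magnitude needs coding.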

Bit-Plane Coding of Residual DCT Coefficients

After taking the residues of all the DCT coefficients in an 8 x 8 block, bit-plane coding is used to code the residue. The bit-plane coding method considers each residual DCT coefficient as a binary number of several bits, instead of as a decimal integer of a certain value as in the run-level coding method. The bit-plane coding method in the present invention only replaces the run-level coding part; all the other syntax elements remain the same.

An example and description of the bit-plane coding method will now be made, wherein 64 residual DCT coefficients for an Inter-block and 63 residual DCT coefficients for an Intra-block (excluding the Intra-DC component, which is coded using a separate method) are utilized. The 64 (or 63) residual DCT coefficients are ordered into a one-dimensional array, and at least one of the residual coefficients is non-zero. The bit-plane coding method then performs the following steps.

The maximum value of all the residual DCT coefficients in a frame is determined, along with the minimum number of bits, N, needed to represent that maximum value in binary format. N is the number of bit-plane layers for this frame and is coded in the frame header.
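The frame-level step reduces to a maximum and a bit count. A sketch, assuming each block's residual magnitudes are available as a list of integers; Python's built-in `int.bit_length()` returns exactly the minimum number of binary digits. The function name is hypothetical.

```python
from typing import Iterable, Sequence

def bitplanes_for_frame(blocks: Iterable[Sequence[int]]) -> int:
    """N = minimum number of bits for the largest residual magnitude in the frame."""
    max_value = max((abs(v) for block in blocks for v in block), default=0)
    return max_value.bit_length()   # e.g. 10 -> 4 bits (1010), 45 -> 6 bits (101101)
```

A frame containing the block used in the example below (maximum 10) together with a block whose maximum is 45 would have N = 6, even though the first block alone needs only 4 bits.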

Within each 8 x 8 block, every one of the 64 (or 63) residual DCT coefficients is represented with N bits in binary format, forming N bit-planes, also called layers or levels. A bit-plane is defined as an array of 64 (or 63) bits, taken one from each residual DCT coefficient at the same significant position.
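Forming the bit-planes is a shift-and-mask per coefficient. The sketch below assumes magnitudes only (signs are handled elsewhere) and orders the planes from most to least significant; `bit_planes` is a hypothetical name.

```python
from typing import List, Sequence

def bit_planes(residues: Sequence[int], n: int) -> List[List[int]]:
    """Split each residual magnitude into n bits; planes[0] is the most significant."""
    if any(abs(v) >= (1 << n) for v in residues):
        raise ValueError("a residue needs more than n bits")
    return [[(abs(v) >> bit) & 1 for v in residues] for bit in range(n - 1, -1, -1)]
```

For `[10, 6, 3]` with n = 4, the planes are `[1, 0, 0]`, `[0, 1, 0]`, `[1, 1, 1]`, `[0, 0, 1]`; weighting plane k by 2**(n - 1 - k) and summing recovers the original values.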

The most significant bit-plane with at least one non-zero bit is determined, and then the number of all-zero bit-planes between the most significant bit-plane so determined and the Nth one is coded. Then, starting from the most significant bit-plane (MSB plane), 2-D symbols are formed of two components: (a) the number of consecutive 0's before a 1 (RUN), and (b) whether there are any 1's left on this bit-plane, i.e., End-Of-Plane (EOP). If a bit-plane after the MSB plane contains all 0's, a special symbol ALL-ZERO is formed to represent an all-zero bit-plane. Note that the MSB plane does not have the all-zero case, because any all-zero bit-planes before the MSB plane have been coded in the previous step.
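The symbol-forming steps can be sketched as below. It is a minimal illustration rather than the flow of Fig. 6: it assumes the block's residues are non-negative magnitudes (signs are handled separately), counts the leading all-zero planes against the frame's N, and emits `ALL-ZERO` for an empty plane after the MSB plane. All names are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

ALL_ZERO = "ALL-ZERO"
Symbol = Union[Tuple[int, int], str]

def plane_symbols(plane: Sequence[int]) -> List[Symbol]:
    """(RUN, EOP) pairs for one bit-plane, or [ALL_ZERO] if it has no 1's."""
    ones = [i for i, bit in enumerate(plane) if bit]
    if not ones:
        return [ALL_ZERO]
    symbols: List[Symbol] = []
    prev = -1
    for k, pos in enumerate(ones):
        run = pos - prev - 1                    # consecutive 0's before this 1
        eop = 1 if k == len(ones) - 1 else 0    # 1: no 1's left on this plane
        symbols.append((run, eop))
        prev = pos
    return symbols

def block_symbols(residues: Sequence[int], n: int) -> Tuple[int, List[List[Symbol]]]:
    """Return (number of leading all-zero planes, symbols for each remaining plane)."""
    msb_bits = max(residues).bit_length()       # planes from the MSB plane down
    if msb_bits == 0 or msb_bits > n:
        raise ValueError("need a non-zero block whose values fit in n bits")
    planes = [[(v >> bit) & 1 for v in residues] for bit in range(msb_bits - 1, -1, -1)]
    return n - msb_bits, [plane_symbols(p) for p in planes]
```

Applied to the example that follows (padded to 64 coefficients, N = 6), this returns 2 leading all-zero planes and the ten (RUN, EOP) symbols listed there.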

Four 2-D VLC tables are used: table VLC-Table-0 corresponds to the MSB plane; table VLC-Table-1 corresponds to the second MSB plane; table VLC-Table-2 corresponds to the third MSB plane; and table VLC-Table-3 corresponds to the fourth MSB plane and all the lower bit-planes. For the ESCAPE cases, RUN is coded with 6 bits and EOP is coded with 1 bit. Escape coding is a method of individually coding very low probability events that are not in the coding tables.
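Table selection depends only on a plane's position below the MSB plane, and an ESCAPE symbol carries fixed-length fields. The sketch below shows both, assuming plane index 0 is the MSB plane. The ESCAPE code word itself comes from the VLC tables, which are not reproduced here, so only the 7-bit fixed-length body is built. A 6-bit RUN suffices because a run within a 64-coefficient block is at most 63. Both function names are hypothetical.

```python
def vlc_table_for_plane(plane_index: int) -> str:
    """Plane 0 is the MSB plane; the fourth MSB plane and all lower planes share table 3."""
    if plane_index < 0:
        raise ValueError("plane_index must be non-negative")
    return f"VLC-Table-{min(plane_index, 3)}"

def escape_body(run: int, eop: int) -> str:
    """Fixed-length part of an ESCAPE symbol: 6-bit RUN followed by 1-bit EOP."""
    if not 0 <= run < 64:
        raise ValueError("RUN must fit in 6 bits")
    if eop not in (0, 1):
        raise ValueError("EOP is a single bit")
    return f"{run:06b}{eop}"
```

For example, an escaped (5, 0) symbol would carry the body `0001010`.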

An example of the above process will now follow. For illustration purposes, assume that the residual values after the zig-zag ordering are as follows and that N = 6:

10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0, ... 0, 0 

The maximum value in this block is found to be 10, and the minimum number of bits to represent 10 in binary format (1010) is 4. Therefore, the two all-zero bit-planes before the MSB plane are coded with a code for the value 2, and the remaining 4 bit-planes are coded using the (RUN, EOP) codes. Writing every value in binary format using 4 bits, the 4 bit-planes are formed as follows:

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 (MSB-plane)

0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 (Second MSB-plane)

1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0 (Third MSB-plane)

0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0 (Fourth MSB-plane or LSB-plane)

Converting the bits of each bit-plane into (RUN, EOP) symbols results in the following:

(0, 1) (MSB-plane)

(2, 1) (Second MSB-plane)

(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (2, 1) (Third MSB-plane)

(5, 0), (8, 1) (Fourth MSB-plane or LSB-plane)

Therefore, there are 10 symbols to be coded using the (RUN, EOP) VLC tables. Based on their locations in the bit-planes, different VLC tables are used for the coding. The enhancement bitstream using all four bit-planes looks as follows:

code_leading_all_zero(2)
code_msb(0,1)
code_msb-1(2,1)
code_msb-2(0,0), code_msb-2(1,0), code_msb-2(2,0), code_msb-2(1,0), code_msb-2(0,0), code_msb-2(2,1)
code_msb-3(5,0), code_msb-3(8,1)
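The listing above can be regenerated mechanically from the per-plane symbols, which makes the table assignment explicit. In this sketch the `code_...` strings are only labels meaning "the code word from that plane's VLC table", and the fourth and lower planes all reuse the `msb-3` label because they share VLC-Table-3; both choices are assumptions for illustration, and planes containing ALL-ZERO are not handled.

```python
from typing import List, Sequence, Tuple

def label_stream(leading_zero_planes: int,
                 planes: Sequence[Sequence[Tuple[int, int]]]) -> List[str]:
    """Order of code words in one block's enhancement bitstream."""
    names = ["msb", "msb-1", "msb-2", "msb-3"]   # one label per VLC table
    out = [f"code_leading_all_zero({leading_zero_planes})"]
    for index, symbols in enumerate(planes):
        name = names[min(index, 3)]
        out.extend(f"code_{name}({run},{eop})" for run, eop in symbols)
    return out
```

Feeding in the leading-zero count 2 and the four symbol lists above reproduces the eleven entries of the listing: one leading-zero code followed by the ten (RUN, EOP) code words.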

In an alternative embodiment, several enhancement bitstreams may be formed from the four bit-planes, in this example each from a respective set comprising one or more of the four bit-planes.
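One way to picture this embodiment is to assign consecutive bit-planes to successive enhancement bitstreams and look at what a decoder can recover from a prefix of them. The grouping `[1, 1, 2]` below is an arbitrary choice, and the recovered values are plain truncations (missing low planes treated as 0), not the optimal reconstruction discussed later. The names are hypothetical.

```python
from typing import List, Sequence

def group_planes(num_planes: int, planes_per_layer: Sequence[int]) -> List[List[int]]:
    """Assign bit-plane indices (0 = MSB plane) to consecutive enhancement bitstreams."""
    if sum(planes_per_layer) != num_planes or any(c < 1 for c in planes_per_layer):
        raise ValueError("each bitstream needs at least one plane, every plane one bitstream")
    layers: List[List[int]] = []
    start = 0
    for count in planes_per_layer:
        layers.append(list(range(start, start + count)))
        start += count
    return layers

def truncated_values(residues: Sequence[int], n_planes: int, planes_received: int) -> List[int]:
    """Magnitudes known when only the top `planes_received` of `n_planes` planes arrive."""
    drop = n_planes - planes_received
    return [(v >> drop) << drop for v in residues]
```

With the example block and the grouping `[1, 1, 2]`, a decoder holding only the first two bitstreams knows 10 as 8 and 6 as 4, and sees the smaller residues as 0.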

Motion Vector Sharing 

In this alternative embodiment of the present invention, motion vector sharing can be utilized when the base layer bitstream exceeds a predetermined size or when more levels of scalability are needed for the enhancement layer. Lowering the number of bits required for coding the motion vectors in the base layer reduces the bandwidth requirements of the base layer bitstream. In base layer coding, a macroblock (16 x 16 pixels for the luminance component and 8 x 8 pixels for each chrominance component) of the current frame is compared with the previous frame within a search range. The closest match in the previous frame is used as a prediction of the current macroblock. The relative displacement of the prediction from the current macroblock, in the horizontal and vertical directions, is called a motion vector.
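The motion search described here can be sketched as an exhaustive integer-pixel search that minimizes the sum of absolute differences (SAD). SAD and full search are common choices but are assumptions here, since the text only says the closest match is used; half-pixel refinement and tie-breaking rules are omitted, and `full_search` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]

def full_search(cur: Frame, ref: Frame, top: int, left: int,
                search_range: int, size: int = 16) -> Tuple[int, int]:
    """Return the (dx, dy) displacement into `ref` with the smallest SAD."""
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue  # candidate block would fall outside the reference frame
            sad = sum(abs(cur[top + r][left + c] - ref[y + r][x + c])
                      for r in range(size) for c in range(size))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    if best is None:
        raise ValueError("no candidate position inside the reference frame")
    return best[1]
```

If the current frame's content appears in the reference frame one row down and two columns right, the search returns the motion vector (2, 1).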

The difference between the current macroblock and its prediction is coded using DCT coding. In order for the decoder to reconstruct the current macroblock, the motion vector has to be coded in the bitstream. Since there is a fixed number of bits for coding a frame, spending more bits on coding the motion vectors leaves fewer bits for coding the motion-compensated differences. Therefore, it is desirable to lower the number of bits for coding the motion vectors and leave more bits for coding the differences between the current macroblock and its prediction.
For each set of 2 x 2 motion vectors, the average motion vector can be determined and used for the four macroblocks. In order not to change the syntax of the base layer coding, the four macroblocks are forced to have identical motion vectors. Since only one out of four motion vectors is coded in the bitstream, the number of bits spent on motion vector coding is reduced, and therefore more bits are available for coding the differences. The cost of this method is that the four macroblocks, which share the same motion vector, may not individually get the best-matched prediction, and the motion-compensated difference may have a larger dynamic range, thus requiring more bits to code the difference.
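The averaging step can be sketched as follows. The sketch assumes motion vectors are stored per macroblock on a grid in half-pixel units and that the average is rounded with Python's `round` (ties to even). The rounding rule, the grid layout, and leaving an odd last row or column unshared are all assumptions, not details from the text.

```python
from typing import List, Tuple

MV = Tuple[int, int]   # (horizontal, vertical), half-pixel units assumed

def share_motion_vectors(mvs: List[List[MV]]) -> List[List[MV]]:
    """Give each 2 x 2 group of macroblocks the average of its four motion vectors."""
    rows, cols = len(mvs), len(mvs[0])
    shared = [row[:] for row in mvs]
    for r in range(0, rows - rows % 2, 2):
        for c in range(0, cols - cols % 2, 2):
            group = [mvs[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)]
            avg = (round(sum(v[0] for v in group) / 4),
                   round(sum(v[1] for v in group) / 4))
            for dr in (0, 1):
                for dc in (0, 1):
                    shared[r + dr][c + dc] = avg
    return shared
```

For a grid with even dimensions, `[row[0::2] for row in shared[0::2]]` lists the one vector per group of four that carries new information.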

For a given fixed bitrate, the savings from coding one out of four motion vectors may not compensate for the increased number of bits required to code the difference with a larger dynamic range. However, for a time-varying bitrate, a wider dynamic range for the enhancement layer provides more flexibility to achieve the best possible usage of the available bandwidth.

Coding Sign Bits

In an alternative embodiment of the present invention, if the base layer quantized DCT coefficient is non-zero, the corresponding enhancement layer difference will have the same sign as the base layer quantized DCT coefficient. Therefore, there is no need to code the sign bit in the enhancement layer.

Conversely, if the base layer quantized DCT coefficient is zero and the corresponding enhancement layer difference is non-zero, a sign bit is placed into the enhancement layer bitstream immediately after the MSB of the difference.
An example of the above method will now follow. 

Difference of a DCT block after ordering:
10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0, ..., 0, 0
Sign indications of the DCT block after ordering:
3, 3, 3, 3, 2, 0, 3, 3, 1, 2, 2, 0, 3, 3, 1, 2, ..., 2, 3

- 0: base layer quantized DCT coefficient = 0 and difference > 0

- 1: base layer quantized DCT coefficient = 0 and difference < 0

- 2: base layer quantized DCT coefficient = 0 and difference = 0

- 3: base layer quantized DCT coefficient ≠ 0.
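The four cases above can be expressed as a small classifier. This is a sketch; the function name `sign_indication` is hypothetical, and the note that indication 0 is sent as sign bit 0 and indication 1 as sign bit 1 is inferred from the coded example in this section rather than stated in the text.

```python
def sign_indication(base_q: int, diff: int) -> int:
    """Classify one DCT position into the four sign-indication cases.

    0: base coefficient is 0 and difference > 0  (sign bit coded; 0 assumed)
    1: base coefficient is 0 and difference < 0  (sign bit coded; 1 assumed)
    2: base coefficient is 0 and difference == 0 (no sign needed)
    3: base coefficient is non-zero              (sign known from base layer)
    """
    if base_q != 0:
        return 3
    if diff > 0:
        return 0
    if diff < 0:
        return 1
    return 2
```

Only indications 0 and 1 ever cost a sign bit, which is why most positions in the example need none.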

In this example, the sign bits associated with the values 10, 6, and 2 do not need to
be coded, and the sign bits associated with 3, 2, 2, and 1 are coded as follows (one
line per bit-plane, starting with the most significant):
code(All Zero)
code(All Zero)
code(0,1)
code(2,1)
code(0,0),code(1,0),code(2,0),0,code(1,0),code(0,0),1,code(2,1),0
code(5,0),code(8,1),1

Every DCT difference has a sign indication associated with it. There are four
possible cases, denoted 0, 1, 2, and 3 in the coding above. If the sign indication is 2
or 3, the sign bit does not have to be coded, because the difference is either zero or
its sign is available from the corresponding base layer data. If the sign indication is
0 or 1, a sign bit must be coded, but only once per difference value rather than in
every bit-plane of the difference. Therefore, the sign bit is placed immediately after
the most significant bit of the difference.
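The bit-plane procedure behind the example can be sketched in Python. Assumptions, all labeled: each bit-plane is written as a sequence of (RUN, EOP) symbols rendered as text tokens `code(run,eop)` rather than real variable-length codewords; an all-zero plane is written as `code(All Zero)`; six bit-planes are used, inferred from the two leading all-zero planes in the example; and a coded sign bit is 0 for positive and 1 for negative, also inferred from the example. The function name `encode_bitplanes` is hypothetical.

```python
from typing import List

ALL_ZERO = "code(All Zero)"  # token for a bit-plane with no 1 bits (assumed form)


def encode_bitplanes(mags: List[int], sign_ind: List[int],
                     num_planes: int) -> List[List[str]]:
    """Code DCT difference magnitudes plane by plane, MSB plane first.

    mags:     absolute values of the differences in scan order.
    sign_ind: sign indication (0-3) for each position.
    Returns one list of tokens per bit-plane.
    """
    planes: List[List[str]] = []
    sign_sent = set()
    for p in range(num_planes - 1, -1, -1):
        ones = [i for i, m in enumerate(mags) if (m >> p) & 1]
        if not ones:
            planes.append([ALL_ZERO])
            continue
        tokens: List[str] = []
        prev = -1
        for k, i in enumerate(ones):
            run = i - prev - 1                    # zeros since the previous 1
            eop = 1 if k == len(ones) - 1 else 0  # 1 marks the last 1 in the plane
            tokens.append(f"code({run},{eop})")
            prev = i
            # The first plane where position i has a 1 holds its MSB; the
            # sign bit follows it, but only for indications 0 and 1.
            if sign_ind[i] in (0, 1) and i not in sign_sent:
                tokens.append(str(sign_ind[i]))
                sign_sent.add(i)
        planes.append(tokens)
    return planes


mags = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
inds = [3, 3, 3, 3, 2, 0, 3, 3, 1, 2, 2, 0, 3, 3, 1, 2]
for plane in encode_bitplanes(mags, inds, num_planes=6):
    print(",".join(plane))
```

Run on the first 16 positions of the example (the remaining zero differences do not change any symbol), it prints the six lines of the coded example above, with sign bits only after the MSBs of 3, 2, 2, and 1.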

Optimal Reconstruction of the DCT Coefficients

In an alternative embodiment of the present invention, N enhancement bitstream
layers or bit-planes may have been generated, but only M of them, wherein M < N,
are available for reconstructing the DCT coefficients because of channel capacity and
other constraints, such as network congestion. The decoder 80 of Fig. 1 may
therefore receive no enhancement difference or only a partial enhancement
difference. In such a case, optimal reconstruction of the DCT coefficients can
proceed as follows:

If the decoded difference = 0, the reconstruction point is the same as that in the
base layer. Otherwise, the reconstructed difference = decoded difference +
0.5 * (1 << decoded_bit_plane), and the reconstruction point = reference point +
reconstructed difference * Q_enh + Q_enh/2.
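A direct transcription of the rule into Python, offered as a sketch. The parameter names are hypothetical; `decoded_diff` is assumed to be expressed in units of Q_enh, `decoded_bit_plane` is assumed to be the index (LSB = 0) of the last bit-plane decoded, and the factor 0.5 on the (1 << decoded_bit_plane) term is a reading of a term that is hard to make out in the source formula. The code mirrors the stated rule rather than deriving it.

```python
def reconstruct(decoded_diff: int, decoded_bit_plane: int,
                base_point: float, reference_point: float,
                q_enh: float) -> float:
    """Reconstruct one DCT coefficient from a partially received difference.

    Assumptions: decoded_diff is in units of q_enh; decoded_bit_plane is
    the index (LSB = 0) of the last decoded bit-plane; the 0.5 factor is
    an interpretation of the source formula.
    """
    if decoded_diff == 0:
        # No usable enhancement data: keep the base layer reconstruction.
        return base_point
    recon_diff = decoded_diff + 0.5 * (1 << decoded_bit_plane)
    return reference_point + recon_diff * q_enh + q_enh / 2
```

For example, with decoded_diff = 4, decoded_bit_plane = 2, reference_point = 4.0 and q_enh = 2.0, the reconstructed difference is 6.0 and the reconstruction point is 4.0 + 12.0 + 1.0 = 17.0.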

In the present embodiment, referring to Figs. 3C and 3D, the optimal
reconstruction point is not the lower boundary of a quantization bin. The above
method specifies how to obtain the optimal reconstruction point when the difference
is quantized and received only partially, i.e., when not all of the generated
enhancement layers are transmitted or received, as shown in Fig. 1, wherein M < N.
What is claimed is:

1. A video encoding method for adapting a video input to a bandwidth of a
transmission channel of a network, the method comprising the steps of:

determining a number N of enhancement layer bitstreams capable of being
adapted to said bandwidth of said transmission channel of said network;

encoding a base layer bitstream from said video input;

encoding N enhancement layer bitstreams from said video input based on the
base layer bitstream, wherein the N enhancement layer bitstreams complement
the base layer bitstream; and

providing the base layer bitstream and the N enhancement layer bitstreams to
said network.

2. The video encoding method according to claim 1, wherein the
determining step includes negotiating with intermediate devices on said
network.

3. The video encoding method according to claim 2, wherein
negotiating includes determining destination resources.

4. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by an MPEG-1 encoding
method.

5. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by an MPEG-2 encoding
method.

6. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by an MPEG-4 encoding
method.

7. The video encoding method according to claim 1, wherein the step of
encoding the base layer bitstreams is performed by a Discrete Cosine
Transform (DCT) method.

8. The video encoding method according to claim 7, wherein after
encoding the base layer bitstreams by a Discrete Cosine Transform (DCT)
method, a DCT coefficient is quantized.

9. The video encoding method according to claim 1, wherein the enhancement
layer bitstreams are based on the difference between an original base layer DCT
coefficient and a corresponding base layer quantized DCT coefficient.

10. The video encoding method according to claim 1, wherein the base
layer bitstream and the N enhancement layer bitstreams provided to the network
are multiplexed.



11. A video decoding method for adapting a video input to a bandwidth of a 
transmission channel of a network, the method comprising the steps of: 

determining a number M of enhancement layer bitstreams of said video input
capable of being received from said transmission channel of said
network;

decoding a base layer bitstream from the received video input;

decoding M enhancement layer bitstreams from the received video
input based on the base layer bitstream, wherein the M received
enhancement layer bitstreams complement the base layer bitstream;
and

reconstructing the base layer bitstream and the M enhancement layer bitstreams.

12. The video decoding method according to claim 11, wherein the
determining step includes negotiating with intermediate devices on said
network.

13. The video decoding method according to claim 12, wherein
negotiating includes determining destination resources.

14. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by an MPEG-1 decoding
method.

15. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by an MPEG-2 decoding
method.

16. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by an MPEG-4 decoding
method.

17. The video decoding method according to claim 11, wherein the step of
decoding the base layer bitstreams is performed by a Discrete Cosine
Transform (DCT) method.

18. The video decoding method according to claim 17, wherein after
decoding the base layer bitstreams by a Discrete Cosine Transform (DCT)
method, a DCT coefficient is unquantized.

19. The video decoding method according to claim 11, wherein coding of the
enhancement layer bitstreams is based on the difference between an original base
layer DCT coefficient and a corresponding base layer quantized DCT
coefficient.

20. The video decoding method according to claim 11, wherein the base
layer bitstream and the M enhancement layers to be reconstructed are
demultiplexed.
21. A video decoding method for adapting a video input to a bandwidth of a
receiving apparatus, the method comprising the steps of:

demultiplexing a base layer bitstream and at least one of a plurality of
enhancement layer bitstreams received from a network;

decoding the base layer bitstream;

decoding at least one of the plurality of enhancement layer bitstreams based
on the generated base layer bitstream, wherein the at least one of the plurality of
enhancement layer bitstreams enhances the base layer bitstream; and

reconstructing a video output.

22. A video encoding method for encoding enhancement layers based on a base
layer bitstream encoded from a video input, the video encoding method comprising the
steps of:

taking a difference between an original DCT coefficient and a reference point;
and

dividing the difference between the original DCT coefficient and the reference
point into N bit-planes.

23. The video encoding method according to claim 22, wherein RUN and EOP
symbols represent the N bit-planes of a DCT block.

24. The video encoding method according to claim 23, wherein the RUN and EOP
symbols are encoded.

25. The video encoding method according to claim 24, wherein a sign bit is not
encoded if the DCT difference is equal to zero or the sign of the DCT difference is the
same as that of the corresponding base layer bitstream data.
26. A video decoding method for reconstructing DCT coefficients when M enhancement
layers of N enhancement layers have been received, wherein M < N, comprising:

means for taking a reconstructed difference as a decoded difference and a
portion of a decoded bit-plane;

means for taking a reconstruction point as a reference point and a
reconstructed difference; and

determining an optimal reconstruction point.

27. A method of coding motion vectors of a plurality of macroblocks, the method 
comprising the steps of: 

determining an average motion vector from N motion vectors for N 
macroblocks; 

utilizing the determined average motion vector as the motion vector for the N
macroblocks; and

encoding 1/N of the motion vectors in a base layer bitstream.